Q-DETR: An Efficient Low-Bit Quantized Detection Transformer
[Figure 2.8 panels: (a) Real-valued DETR-R50; (b) 4-bit DETR-R50. Plotted queries: decoder.0.co_attn.query, decoder.2.co_attn.query, decoder.5.co_attn.query]
FIGURE 2.8
Histograms of the query values q (blue shading) and the corresponding PDF curves (red)
of the Gaussian distribution [136], for the cross-attention of different decoder layers in
(a) real-valued DETR-R50 and (b) 4-bit quantized DETR-R50 (baseline). Each Gaussian
distribution is generated from the statistical mean and variance of the query values. The
query in the quantized DETR-R50 suffers information distortion compared with the
real-valued one. Experiments are performed on the VOC dataset [62].
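The comparison described in the caption of Figure 2.8 can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the query values are synthetic samples rather than activations from DETR-R50, and the symmetric uniform quantizer only stands in for the 4-bit baseline, whose exact form is not given here.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def histogram_vs_gaussian(q, bins=50):
    """Return bin centers, the normalized histogram of q, and the PDF of the
    Gaussian built from the sample mean and variance of q."""
    q = np.asarray(q, dtype=np.float64).ravel()
    mean, var = q.mean(), q.var()
    density, edges = np.histogram(q, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density, gaussian_pdf(centers, mean, var)

def uniform_quantize(q, n_bits=4):
    """Hypothetical symmetric uniform quantizer (an assumption, not the
    baseline quantizer used for the figure)."""
    levels = 2 ** (n_bits - 1) - 1            # 7 positive levels for 4 bits
    scale = np.abs(q).max() / levels
    return np.clip(np.round(q / scale), -levels - 1, levels) * scale

# Synthetic stand-in for the query values of one cross-attention module.
rng = np.random.default_rng(0)
q_real = rng.normal(0.0, 1.0, size=10_000)
q_4bit = uniform_quantize(q_real, n_bits=4)

for name, q in [("real-valued", q_real), ("4-bit", q_4bit)]:
    centers, density, pdf = histogram_vs_gaussian(q)
    # Mean absolute gap between histogram and fitted Gaussian, as a rough
    # indicator of how far the values depart from the Gaussian shape.
    print(name, float(np.mean(np.abs(density - pdf))))
```

Plotting `density` as bars and `pdf` as a curve over `centers` gives a picture of the same kind as the figure panels. The mean absolute gap printed at the end is only an illustrative summary and is not a measure used in the text.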
[Figure 2.9 panels: (a) Real-valued DETR-R50; (b) 4-bit DETR-R50]
FIGURE 2.9
Spatial attention weight maps in the last decoder layer of (a) real-valued DETR-R50 and
(b) 4-bit quantized DETR-R50. The rectangle denotes the ground-truth bounding box.
Following [169], the highlighted areas denote large attention weights in the four selected
heads that correspond to bound prediction. Whereas the real-valued model focuses on the
ground-truth bounds, the attention of the quantized DETR-R50 deviates significantly.